This week, you are tasked with constructing a network graph from a small collection of historical letters.
You’ve just come back from a research trip to the National Archives in the UK, as part of your project on spy networks operating between Britain and the Dutch Republic in the seventeenth century. While there you took down details of some potentially interesting letters sent in the summer of 1666, during the Second Anglo-Dutch War. Many were written by the English playwright and spy Aphra Behn, and a few others looked like they might be related. You’re hoping the content can tell you something about the key players involved in secret intelligence during the war.
The information seems ideally suited to constructing a network graph of some kind.
A network graph (as you learned in this week’s lecture) is a representation of things and their relations to other things. These things (we call them entities, or nodes in network-science speak) can be of many types: people, places, institutions, or even words or concepts. The relations (called edges) can also be of different types—for example ‘brother of’ or ‘employer’ in a social network.
To construct and analyse a network from this historical data you need to carry out a number of steps:
Once you have done this, you’ll write a short report on your findings, critically reflecting on the method, the tool, and its application to your own area of academic interest.
On your research trip you made detailed notes of 22 letters, noting down people, dates, people mentioned, and locations, as well as copying a brief description of each letter, written by the archive cataloguers.
All of this letter information is available in a tab called ‘data’ at the top of the screen. You can also download a spreadsheet containing the same information, which can be opened in Excel, OpenOffice, or Google Sheets, by right-clicking this link and selecting ‘Save Link As..’ (in Google Chrome) or ‘Download Linked File As..’ (Safari).
The first decision you need to make is which pieces of information will make up the nodes and edges of your network. Some options include:
There is no right or wrong answer: almost anything can be represented as a network. However, some pieces of information and combinations of information will be easier to interpret than others.
A crucial step in any data analysis using humanities data is data cleaning. This is the process of correcting and standardising your data so that it can be properly ‘understood’ by computer programs (which are not as forgiving as a human reader).
As you were in a rush during your visit to the archives, your notes contain a number of mistakes and inconsistencies. These might include:
These need to be fixed before constructing your network, as otherwise they will affect the results. The network application will treat spelling variations as separate nodes, for example.
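As a rough sketch of the kind of standardisation described above, the following Python maps spelling variants of a name to one agreed form. The variants in the `CANONICAL` table are invented for illustration and are not taken from the letter data; you would build your own table from the inconsistencies you find in your notes.

```python
import re

# Hypothetical lookup table: each known variant (lower-cased) points to the
# single spelling you have decided to use. These variants are made up.
CANONICAL = {
    "aphra behn": "Aphra Behn",
    "aphara behn": "Aphra Behn",
    "a. behn": "Aphra Behn",
}

def clean_name(raw: str) -> str:
    """Trim stray spaces, collapse repeated spaces, and apply the lookup table."""
    tidy = re.sub(r"\s+", " ", raw).strip()
    # Names not in the table are returned unchanged (apart from spacing).
    return CANONICAL.get(tidy.lower(), tidy)

names = ["Aphra Behn", " aphara  behn ", "A. Behn", "William Scott"]
cleaned = [clean_name(n) for n in names]
print(sorted(set(cleaned)))  # ['Aphra Behn', 'William Scott']
```

Without the cleaning step, the four raw strings would become four separate nodes; with it, they become two.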
Once you have decided which data to turn into a network, you need to extract it in a format suitable for input to a network analysis application.
The simplest format for network analysis is what is known as an edge list. This is a list containing two columns: first, the Source node and second, the Target node. An edge (the line between points in a network diagram) will be drawn between each pair. The edge list can be directed, meaning that the flow of information goes from the Source to the Target node, and not necessarily the other way around.
The application we’re going to use takes as its input one connection per line, with the Source and Target nodes separated by a comma.
So for example, if you decide to draw a network of connected people:
To, From. Once you have finished, you can move on to the application below.
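If you would rather build the edge list with a script than by hand, the format described above (one connection per line, two names separated by a comma) can be sketched as follows. The two sender and recipient pairs come from the letter descriptions quoted in this exercise; the function name `to_edge_list` is hypothetical, and whether the application expects a space after the comma is not stated here, so treat the exact spacing as an assumption.

```python
# (sender, recipient) pairs taken from two of the letter descriptions.
letters = [
    ("Lord Arlington", "Williamson"),
    ("Jas. Halsall", "George Porter"),
]

def to_edge_list(pairs):
    """Return one 'Source,Target' line per pair, joined by newlines."""
    lines = []
    for source, target in pairs:
        source, target = source.strip(), target.strip()
        # A comma inside a name would be read as a column separator.
        if "," in source or "," in target:
            raise ValueError(f"Name contains a comma: {source!r} / {target!r}")
        lines.append(f"{source},{target}")
    return "\n".join(lines)

edge_list = to_edge_list(letters)
print(edge_list)
# Lord Arlington,Williamson
# Jas. Halsall,George Porter
```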
Constructing a network of people mentioned in the descriptions of the letters is a special case. To make a network of people mentioned by a certain author, you’ll need to give each person mentioned a separate line in the edge list.
Each line in your document will contain first an author name, and second, a single ‘person mentioned’. If the description contains more than one person, you’ll need to repeat the author name for each person mentioned.
For example, each of these two letter descriptions mentions several people:
[Lord Arlington] to Williamson. Sends a relation for the King, thinking he may not have had so particular a one. Wants an express to-morrow on his way. Asks if His Majesty is going down to the coast. Sir Thos. Clifford’s letter should be called for when the King has done with it, that it be not lost. With the postal envelope, ordering the letter to be immediately dispatched.
Jas. Halsall to George Porter, stone gallery, Whitehall. Mrs. Carter, the poor woman they met yesterday, formerly lived with Mrs. Abbott, and preserved in her house Tom Blague, Robin Killigrew, Sir Rob. Shirley, Mr. O’Neale, Nic. Armorer, Lord Rochester, himself, and many others of the King’s servants; she was their confidant and very faithful, and is now ready to starve; it would be a charity to get her into a hospital, or find her some means to live
To correctly represent these as a network, your edge list should look like the following:
Note that this is a special type of network with two different types of nodes (letter-writers connected to people mentioned), known as a bipartite network, and as such many of the metrics calculated using the web application will be more difficult to interpret. It is still worthwhile, particularly for the visualisations produced.
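The rule of repeating the author once for every person mentioned can be sketched in Python as below. The names are drawn from the two descriptions quoted above, but the selection is partial and is an assumption: deciding exactly who counts as ‘mentioned’ in each letter is part of the exercise. The function name `expand_mentions` is hypothetical.

```python
# Author -> people mentioned in that author's letter.
# A partial, illustrative selection from the two quoted descriptions.
mentions = {
    "Lord Arlington": ["Sir Thos. Clifford"],
    "Jas. Halsall": ["Mrs. Carter", "Mrs. Abbott", "Tom Blague", "Robin Killigrew"],
}

def expand_mentions(mentions_by_author):
    """Give every (author, person mentioned) pair its own row."""
    rows = []
    for author, people in mentions_by_author.items():
        for person in people:
            rows.append((author, person))
    return rows

rows = expand_mentions(mentions)
for author, person in rows:
    print(f"{author},{person}")
```

Each author appears on as many lines as they mention people, so Jas. Halsall appears four times in this sketch and Lord Arlington once.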
The tool you’re going to use also allows for the input of an edge weight: for example, if you chose to construct a network of connected places (using the origin and destination information), the weight might be the number of times the two places are connected in the data.
To do this, simply count up the repeated values, and add the weight as a number, again separated by a comma:
If you choose to use a weight, you’ll need to enter a value of 1 in any rows where you don’t have multiple instances.
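Counting repeated connections by hand is error-prone, so here is a minimal sketch of the same step using Python's `collections.Counter`. The place names are invented for illustration and are not taken from the letter data.

```python
from collections import Counter

# Hypothetical (origin, destination) pairs, one per letter.
routes = [
    ("Antwerp", "London"),
    ("Antwerp", "London"),
    ("Bruges", "London"),
]

# Counter tallies how many times each identical pair occurs.
weights = Counter(routes)

# One 'Source,Target,Weight' line per distinct pair.
weighted_lines = [f"{origin},{dest},{count}" for (origin, dest), count in weights.items()]
print("\n".join(weighted_lines))
# Antwerp,London,2
# Bruges,London,1
```

Pairs that occur only once receive a weight of 1 automatically. Note that this sketch treats direction as meaningful, so a letter from London to Antwerp would be counted separately from one sent from Antwerp to London.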
Once you have uploaded your network using the ‘calculate’ button, you’ll be presented with a range of basic network metrics for each node in your data. These all in some way measure the node’s importance, or centrality to the network in question.
This centrality can be calculated in different ways.
Degree simply counts the total number of connections for each node. In the following network, Aphra Behn has a degree of 4, as she is directly connected to four other nodes.
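To make the counting concrete, here is a small sketch that computes degree from an edge list. The edges are a toy example with placeholder correspondents, not the real letter network, and the sketch assumes an undirected network in which repeated connections between the same pair count once; the web application may count differently.

```python
from collections import defaultdict

# Toy edge list: placeholder names, invented for illustration.
edges = [
    ("Aphra Behn", "Person A"),
    ("Aphra Behn", "Person B"),
    ("Aphra Behn", "Person C"),
    ("Aphra Behn", "Person D"),
    ("Person D", "Person E"),
]

def degree(edge_pairs):
    """Count the distinct neighbours of every node (undirected)."""
    neighbours = defaultdict(set)
    for source, target in edge_pairs:
        neighbours[source].add(target)
        neighbours[target].add(source)
    return {node: len(linked) for node, linked in neighbours.items()}

degrees = degree(edges)
print(degrees["Aphra Behn"])  # 4
print(degrees["Person D"])    # 2
```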
Betweenness centrality is a measurement of how important a particular node is in the flow of information between different parts of the network. It is calculated by finding each of the ‘shortest paths’ through the network: the smallest number of ‘hops’ between any two nodes. Nodes which appear on many of these paths score highly for betweenness centrality.
In the network diagram above, Henry Bennet, Earl of Arlington lies on the path from any node on the right to any on the left, such as that from John Tippets to William Scott (highlighted in red). After Aphra Behn, the most connected node, he scores the highest for betweenness centrality.
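For readers curious about how shortest paths turn into a score, the sketch below uses Brandes' algorithm, a standard way of computing betweenness on an unweighted, undirected network. The toy network (a well-connected ‘Hub’ and a ‘Bridge’ linking two groups) is invented to mirror the situation described above, and the scores are unnormalised raw counts of node pairs; the web application may scale its values differently.

```python
from collections import defaultdict, deque

# Toy network: "Hub" is the most connected node, "Bridge" links D and E
# to everyone else. All labels are placeholders.
edges = [
    ("Hub", "A"), ("Hub", "B"), ("Hub", "C"), ("Hub", "Bridge"),
    ("Bridge", "D"), ("Bridge", "E"),
]

def betweenness(edge_pairs):
    """Unnormalised betweenness centrality (Brandes), undirected, unweighted."""
    adj = defaultdict(set)
    for a, b in edge_pairs:
        adj[a].add(b)
        adj[b].add(a)
    adj = dict(adj)
    score = {node: 0.0 for node in adj}
    for start in adj:
        # Breadth-first search: count shortest paths from 'start' to each node.
        order = []
        preds = {node: [] for node in adj}
        n_paths = dict.fromkeys(adj, 0)
        n_paths[start] = 1
        dist = {start: 0}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, crediting nodes on the paths.
        dependency = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != start:
                score[w] += dependency[w]
    # Each pair of nodes was visited from both ends, so halve the totals.
    return {node: value / 2 for node, value in score.items()}

scores = betweenness(edges)
print(scores["Hub"], scores["Bridge"], scores["A"])  # 12.0 9.0 0.0
```

In this toy network the hub sits on the shortest path between 12 pairs of other nodes and the bridge on 9, while the outer nodes sit on none.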
Eigenvector centrality measures a node’s importance based on its connections to other important nodes: a link to a well-connected node counts for more than a link to a peripheral one. Individuals with a high eigenvector centrality often ‘have the ear’ of important people and wield influence in that way.
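The idea can be sketched with a simple repeated-averaging procedure (power iteration): every node starts with the same score, and on each round a node's new score is its own score plus the scores of its neighbours, rescaled so the numbers stay comparable. Adding the node's own score is a common stabilising choice, not something the web application is known to do, and the toy network and labels below are invented for illustration.

```python
import math
from collections import defaultdict

# Toy network: "Adviser" and "Outsider" both have two connections,
# but only "Adviser" is directly linked to the well-connected "Hub".
edges = [
    ("Hub", "A"), ("Hub", "B"), ("Hub", "C"), ("Hub", "Adviser"),
    ("Adviser", "Outsider"), ("Outsider", "Clerk"),
]

def eigenvector_centrality(edge_pairs, iterations=200, tolerance=1e-9):
    """Approximate eigenvector centrality by power iteration (undirected)."""
    adj = defaultdict(set)
    for a, b in edge_pairs:
        adj[a].add(b)
        adj[b].add(a)
    adj = dict(adj)
    scores = {node: 1.0 for node in adj}
    for _ in range(iterations):
        # New score = own score + neighbours' scores.
        updated = {
            node: scores[node] + sum(scores[n] for n in adj[node])
            for node in adj
        }
        # Rescale so the squared scores sum to 1.
        norm = math.sqrt(sum(v * v for v in updated.values()))
        updated = {node: v / norm for node, v in updated.items()}
        change = sum(abs(updated[n] - scores[n]) for n in adj)
        scores = updated
        if change < tolerance:
            break
    return scores

scores = eigenvector_centrality(edges)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking[:2])  # ['Hub', 'Adviser']
```

Although the adviser and the outsider have the same degree, the adviser scores higher because one of their two connections is the hub.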
For the assignment alongside this task, you are asked to put together a short report (c. 500 words) reflecting on the task carried out above. This can (and ideally should) contain multimedia objects in the form of screenshots, tables of data, and so forth. In it you should:
All the letter data is in the table below; you can display all of it by using the drop-down menu at the top-left, or click through individual pages at the bottom of the screen.
Alternatively you can download a spreadsheet containing the same information by right-clicking this link and selecting ‘Save Link As..’ (in Google Chrome) or ‘Download Linked File As..’ (Safari).
If you’ve followed the steps in the exercise correctly, you should now have an open text document containing an edge list. Enter it into the text box in the web application below, decide whether your network should be weighted or unweighted, and click ‘Calculate’.